152 research outputs found

    Transcription Factor-DNA Binding Via Machine Learning Ensembles

    We present ensemble methods in a machine learning (ML) framework combining predictions from five known motif/binding site exploration algorithms. For a given TF, the ensemble starts with position weight matrices (PWMs) for the motif, collected from the component algorithms. Using dimension reduction, we identify significant PWM-based subspaces for analysis. Within each subspace, a machine classifier is built for identifying the TF's gene (promoter) targets (Problem 1). These PWM-based subspaces form an ML-based sequence analysis tool. Problem 2 (finding binding motifs) is solved by agglomerating k-mer (string) feature PWM-based subspaces that stand out in identifying gene targets. We approach Problem 3 (binding sites) with a novel machine learning approach that uses promoter string features and ML importance scores in a classification algorithm locating binding sites across the genome. For target gene identification, this method improves performance (measured by the F1 score) by about 10 percentage points over (a) the motif scanning method and (b) the coexpression-based association method. The top motif outperformed the 5 component algorithms as well as two other common algorithms (BEST and DEME). For identifying individual binding sites on a benchmark cross-species database (Tompa et al., 2005), we match the best performer without much human intervention. The method also improved performance on mammalian TFs. The ensemble can integrate orthogonal information from different weak learners (potentially using entirely different types of features) into a machine learner that performs consistently better for more TFs. The TF gene target identification component (Problem 1 above) is useful in constructing a transcriptional regulatory network from known TF-target associations. The ensemble is easily extendable to include more tools as well as future PWM-based information. Comment: 33 pages
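    The abstract above builds on PWMs and compares against a "motif scanning method", but does not spell out how a PWM scores a sequence. A minimal sketch of the standard log-odds construction and scan follows, assuming a uniform background, a pseudocount of 1 and an arbitrary threshold of 5 bits; the sites, the sequence and the helper names (`build_pwm`, `score`, `scan`) are hypothetical and not taken from the paper's code.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0):
    """Build a log2-odds position weight matrix from equal-length aligned sites,
    assuming a uniform 0.25 background for every base."""
    pwm = []
    for i in range(len(sites[0])):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / 0.25) for b in BASES})
    return pwm


def score(pwm, kmer):
    """Sum of per-position log-odds scores for one window."""
    return sum(column[base] for column, base in zip(pwm, kmer))


def scan(pwm, sequence, threshold):
    """Slide the PWM along the sequence; keep windows scoring >= threshold."""
    width = len(pwm)
    hits = []
    for pos in range(len(sequence) - width + 1):
        s = score(pwm, sequence[pos:pos + width])
        if s >= threshold:
            hits.append((pos, s))
    return hits


# Hypothetical aligned sites and promoter fragment, for illustration only.
sites = ["TGACTCA", "TGACTCA", "TGAGTCA", "TGACTAA"]
pwm = build_pwm(sites)
hits = scan(pwm, "GGTGACTCAGG", threshold=5.0)
print([pos for pos, _ in hits])  # only the window starting at position 2 passes
```

    The threshold trades sensitivity for specificity; an ensemble such as the one described would sit on top of many such scans rather than rely on one fixed cutoff.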

    Guest Editorial: Systems Biology, the Second Time Around

    Phenotypic connections in surprising places

    Connections have been revealed between very different human diseases using phenotype associations in other species.

    The society of genes: networks of functional links between genes from comparative genomics

    BACKGROUND: Comparative genomics provides at least three methods beyond traditional sequence similarity for identifying functional links between genes: the examination of common phylogenetic distributions, the analysis of conserved proximity along the chromosomes of multiple genomes, and observations of fusions of genes into a multidomain gene in another organism. We have previously generated the links according to each of these methods individually for 43 known microbial genomes. Here we combine these results to construct networks of functional associations. RESULTS: We show that the functional networks obtained by applying these methods have different topologies and that the information they provide is largely additive. In particular, the combined networks of functional links contain an average of 57% of an organism's complete genetic complement, uncover substantial portions of known pathways, and suggest the function of previously unannotated genes. In addition, the combined networks are qualitatively different from the networks obtained using individual methods. They have a dominant cluster that contains approximately 80%-90% of the genes, independent of genome size, and the dominant clusters show the small-world behavior expected of a biological system, with global connectivity that is nearly random and local properties that are highly ordered. CONCLUSIONS: When the information on functional linkage provided by three emerging computational methods is combined, the integrated network uncovers large numbers of conserved pathways and identifies clusters of functionally related genes. It therefore shows considerable utility and promise as a tool for understanding genomic structure and for guiding high-throughput experimental investigations.
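    The abstract does not say exactly how the three link sets are merged. Assuming a simple union of undirected links, and reading the "dominant cluster" as the largest connected component, the combination step and the fraction of the genome it covers could be sketched as follows; the gene names, links and genome size are placeholders, and `combine_links` and `components` are hypothetical helpers.

```python
from collections import defaultdict


def combine_links(*link_sets):
    """Union of undirected gene-gene links predicted by several methods."""
    combined = set()
    for links in link_sets:
        for a, b in links:
            if a != b:
                combined.add(frozenset((a, b)))
    return combined


def components(links):
    """Connected components of an undirected graph given as a set of edges."""
    adj = defaultdict(set)
    for edge in links:
        a, b = tuple(edge)
        adj[a].add(b)
        adj[b].add(a)
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        seen.add(start)
        while stack:  # iterative depth-first search
            node = stack.pop()
            comp.add(node)
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    stack.append(nbr)
        comps.append(comp)
    return comps


# Hypothetical links from the three methods (gene names are placeholders).
phylogenetic = {("g1", "g2"), ("g3", "g4")}
proximity = {("g2", "g3"), ("g5", "g6")}
fusion = {("g4", "g7")}

network = combine_links(phylogenetic, proximity, fusion)
largest = max(components(network), key=len)
genome_size = 10  # hypothetical number of genes in the genome
print(sorted(largest), len(largest) / genome_size)
```

    No single method here links g1 to g7, yet the union does, which is one way the "largely additive" information can enlarge the dominant cluster.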

    The interaction map of yeast: terra incognita?

    A systematic curation of the literature on Saccharomyces cerevisiae has yielded a comprehensive collection of experimentally observed interactions. This new resource augments current views of the topological structure of yeast's physical and genetic networks, but also reveals that existing studies cover only a fraction of the cell

    Visualization of metabolic interaction networks in microbial communities using VisANT 5.0

    The complexity of metabolic networks in microbial communities poses an unresolved visualization and interpretation challenge. We address this challenge in the newly expanded version of a software tool for the analysis of biological networks, VisANT 5.0. We focus in particular on facilitating the visual exploration of metabolic interaction between microbes in a community, e.g. as predicted by COMETS (Computation of Microbial Ecosystems in Time and Space), a dynamic stoichiometric modeling framework. Using VisANT's unique metagraph implementation, we show how one can use VisANT 5.0 to explore different time-dependent ecosystem-level metabolic networks. In particular, we analyze the metabolic interaction network between two bacteria previously shown to display an obligate cross-feeding interdependency. In addition, we illustrate how a putative minimal gut microbiome community could be represented in our framework, making it possible to highlight interactions across multiple coexisting species. We envisage that the "symbiotic layout" of VisANT can be employed as a general tool for the analysis of metabolism in complex microbial communities as well as heterogeneous human tissues.This work was supported by the National Institutes of Health, R01GM103502-05 to CD, ZH and DS. Partial support was also provided by grants from the Office of Science (BER), U.S. Department of Energy (DE-SC0004962), the Joslin Diabetes Center (Pilot & Feasibility grant P30 DK036836), the Army Research Office under MURI award W911NF-12-1-0390, National Institutes of Health (1RC2GM092602-01, R01GM089978 and 5R01DE024468), NSF (1457695), and Defense Advanced Research Projects Agency Biological Technologies Office (BTO), Program: Biological Robustness In Complex Settings (BRICS), Purchase Request No. HR0011515303, Program Code: TRS-0 Issued by DARPA/CMO under Contract No. HR0011-15-C-0091. Funding for open access charge: National Institutes of Health. 
The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. (R01GM103502-05 - National Institutes of Health; 1RC2GM092602-01 - National Institutes of Health; R01GM089978 - National Institutes of Health; 5R01DE024468 - National Institutes of Health; DE-SC0004962 - Office of Science (BER), U.S. Department of Energy; P30 DK036836 - Joslin Diabetes Center; W911NF-12-1-0390 - Army Research Office under MURI; 1457695 - NSF; HR0011515303 - Defense Advanced Research Projects Agency Biological Technologies Office (BTO), Program: Biological Robustness In Complex Settings (BRICS); HR0011-15-C-0091 - DARPA/CMO; National Institutes of Health) Published version

    Classifying transcription factor targets and discovering relevant biological features

    Abstract
    Background: An important goal in post-genomic research is discovering the network of interactions between transcription factors (TFs) and the genes they regulate. We have previously reported the development of a supervised-learning approach to TF target identification, and used it to predict targets of 104 transcription factors in yeast. We now include a new sequence conservation measure, expand our predictions to include 59 new TFs, introduce a web server, and implement an improved ranking method to reveal the biological features contributing to regulation. The classifiers combine 8 genomic datasets covering a broad range of measurements, including sequence conservation, sequence overrepresentation, gene expression, and DNA structural properties.
    Principal findings:
    (1) Application of the method amplifies the information available about yeast regulators. The ratio of total targets to previously known targets is greater than 2 for 11 TFs, with several showing larger gains: Ash1 (4), Ino2 (2.6), Yaf1 (2.4), and Yap6 (2.4).
    (2) Many predicted targets for TFs match well with the known biology of their regulators. As a case study, we discuss the regulator Swi6, presenting evidence that it may be important in the DNA damage response, and that the previously uncharacterized gene YMR279C plays a role in the DNA damage response and perhaps in cell-cycle progression.
    (3) A procedure based on recursive feature elimination is able to uncover, from the large initial data sets, those features that best distinguish targets for any TF, providing clues relevant to its biology. An analysis of Swi6 suggests a possible role in lipid metabolism, and more specifically in the metabolism of ceramide, a bioactive lipid currently being investigated for anti-cancer properties.
    (4) An analysis of global network properties highlights the transcriptional network hubs: the factors that control the most genes and the genes that are bound by the largest set of regulators. Cell-cycle and growth-related regulators dominate the former; genes involved in carbon metabolism and energy generation dominate the latter.
    Conclusion: Postprocessing of regulatory-classifier results can provide high-quality predictions, and feature-ranking strategies can deliver insight into the regulatory functions of TFs. Predictions, including the full transcriptional network, are available at an online web server and can be analyzed using the VisANT network analysis suite.
    Reviewers: This article was reviewed by Igor Jouline, Todd Mockler (nominated by Valerian Dolja), and Sandor Pongor.
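    Finding (3) mentions recursive feature elimination without naming the classifier it wraps. The sketch below shows the generic RFE loop (retrain, drop the feature with the smallest squared weight, repeat) around a plain logistic-regression model fit by gradient descent; the choice of model, learning rate, epoch count, the helper names and the toy data are all assumptions, not the authors' setup.

```python
import math


def train_logistic(X, y, features, lr=0.5, epochs=500):
    """Fit logistic regression on the chosen feature columns by batch
    gradient descent; return one weight per chosen feature."""
    w = [0.0] * len(features)
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(features)
        grad_b = 0.0
        for row, label in zip(X, y):
            z = b + sum(wj * row[f] for wj, f in zip(w, features))
            err = 1.0 / (1.0 + math.exp(-z)) - label  # predicted prob minus label
            for j, f in enumerate(features):
                grad_w[j] += err * row[f]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w


def rfe(X, y, n_keep):
    """Recursive feature elimination: retrain on the surviving features,
    drop the one with the smallest squared weight, repeat until n_keep remain.
    Returns (kept feature indices, indices in the order they were dropped)."""
    remaining = list(range(len(X[0])))
    dropped = []
    while len(remaining) > n_keep:
        w = train_logistic(X, y, remaining)
        worst = min(range(len(remaining)), key=lambda j: w[j] ** 2)
        dropped.append(remaining.pop(worst))
    return remaining, dropped


# Toy data: feature 0 tracks the label; features 1 and 2 are balanced noise.
X = [[1, 1, 1], [1, -1, 1], [1, 1, -1], [1, -1, -1],
     [-1, 1, 1], [-1, -1, 1], [-1, 1, -1], [-1, -1, -1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
kept, dropped = rfe(X, y, n_keep=1)
print(kept, dropped)
```

    Retraining after each removal is what distinguishes RFE from ranking all features once from a single fit.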

    In silico regulatory analysis for exploring human disease progression

    © 2008 Holloway et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License.

    Transcription factor-DNA binding via machine learning ensembles

    The network of interactions between transcription factors (TFs) and their regulatory gene targets governs many of the behaviors and responses of cells. Construction of a transcriptional regulatory network involves three interrelated problems, defined for any regulator: finding (1) its target genes, (2) its binding motif and (3) its DNA binding sites. Many tools have been developed in the last decade to solve these problems. However, the performance of these algorithms has not been consistent across all transcription factors. Because machine learning algorithms have shown advantages in integrating information of different types, we investigate a machine-based approach to integrating predictions from an ensemble of commonly used motif exploration algorithms. Published version